尽管最近的研究集中在量化单词用法上以找到叙事情感弧的整体形状,但叙事中叙事的某些特征仍有待探索。在这里,我们通过找到单词用法中波动开始相关的文本长度来表征亚叙事的叙事时间尺度。我们代表30,000多个项目Gutenberg书籍作为时间序列使用OusiOmetrics,这是一个具有基本含义的功率破坏者框架,本身是对价价 - 宽松义务框架的重新解释,这些框架源自语义差异。我们使用经验模式分解将每本书的力量和危险时间序列分解为组成振荡模式和非振荡趋势的总和。通过将原始力量和危险时间序列的分解与从洗牌文本中得出的分解,我们发现较短的书籍仅显示出一般趋势,而较长的书籍除了一般趋势外,还具有波动,类似于子图在一个中的弧线中的弧线。总体叙事弧。这些波动通常有几千个单词的时期,无论书籍长度或库分类代码如何,但根据书的内容和结构而有所不同。我们的方法提供了一种数据驱动的denoisising方法,可用于各种长度的文本,与使用大型窗口尺寸的更传统的方法相反,该方法可能会无意中平滑相关信息,尤其是对于较短的文本而言。
translated by 谷歌翻译
情绪感知智能系统对于广泛的应用是必不可少的。这些系统由语言模型驱动,这主要落入两个范式:基于词汇和上下文。虽然最近的上下文模型越来越占主导地位,但由于它们的可解释性和易用性,我们仍然可以看到基于词汇的模型的需求。例如,基于词汇的模型允许研究人员容易地确定哪些单词和短语对测量情绪的变化有贡献。任何基于词汇的方法的挑战是,词典需要通过新的单词和表达进行常规扩展。在这里,我们提出了两个用于自动词典扩展的模型。我们的第一个模型建立了一种基线,采用简单而浅的神经网络,使用非上下文方法初始化了预先训练的单词嵌入。我们的第二种模式改进了我们的基线,具有深度变压器的网络,它带来了估计其词汇极性的单词定义。我们的评估表明,两种模型都能够以与亚马逊机械土耳其人的评论者相似的准确度,但是在成本的一小部分中,可以获得类似的准确性。
translated by 谷歌翻译
Visual object tracking under challenging conditions of motion and light can be hindered by the capabilities of conventional cameras, prone to producing images with motion blur. Event cameras are novel sensors suited to robustly perform vision tasks under these conditions. However, due to the nature of their output, applying them to object detection and tracking is non-trivial. In this work, we propose a framework to take advantage of both event cameras and off-the-shelf deep learning for object tracking. We show that reconstructing event data into intensity frames improves the tracking performance in conditions under which conventional cameras fail to provide acceptable results.
translated by 谷歌翻译
Both clustering and outlier detection play an important role for meteorological measurements. We present the AWT algorithm, a clustering algorithm for time series data that also performs implicit outlier detection during the clustering. AWT integrates ideas of several well-known K-Means clustering algorithms. It chooses the number of clusters automatically based on a user-defined threshold parameter, and it can be used for heterogeneous meteorological input data as well as for data sets that exceed the available memory size. We apply AWT to crowd sourced 2-m temperature data with an hourly resolution from the city of Vienna to detect outliers and to investigate if the final clusters show general similarities and similarities with urban land-use characteristics. It is shown that both the outlier detection and the implicit mapping to land-use characteristic is possible with AWT which opens new possible fields of application, specifically in the rapidly evolving field of urban climate and urban weather.
translated by 谷歌翻译
Social media and messaging apps have become major communication platforms. Multimedia contents promote improved user engagement and have thus become a very important communication tool. However, fake news and manipulated content can easily go viral, so, being able to verify the source of videos and images as well as to distinguish between native and downloaded content becomes essential. Most of the work performed so far on social media provenance has concentrated on images; in this paper, we propose a CNN architecture that analyzes video content to trace videos back to their social network of origin. The experiments demonstrate that stating platform provenance is possible for videos as well as images with very good accuracy.
translated by 谷歌翻译
Machine Learning models capable of handling the large datasets collected in the financial world can often become black boxes expensive to run. The quantum computing paradigm suggests new optimization techniques, that combined with classical algorithms, may deliver competitive, faster and more interpretable models. In this work we propose a quantum-enhanced machine learning solution for the prediction of credit rating downgrades, also known as fallen-angels forecasting in the financial risk management field. We implement this solution on a neutral atom Quantum Processing Unit with up to 60 qubits on a real-life dataset. We report competitive performances against the state-of-the-art Random Forest benchmark whilst our model achieves better interpretability and comparable training times. We examine how to improve performance in the near-term validating our ideas with Tensor Networks-based numerical simulations.
translated by 谷歌翻译
Soft robots are interesting examples of hyper-redundancy in robotics, however, the nonlinear continuous dynamics of these robots and the use of hyper-elastic and visco-elastic materials makes modeling of these robots more complicated. This study presents a geometric Inverse Kinematic (IK) model for trajectory tracking of multi-segment extensible soft robots, where, each segment of the soft actuator is geometrically approximated with multiple rigid links connected with rotary and prismatic joints. Using optimization methods, the desired configuration variables of the soft actuator for the desired end-effector positions are obtained. Also, the redundancy of the robot is applied for second task applications, such as tip angle control. The model's performance is investigated through simulations, numerical benchmarks, and experimental validations and results show lower computational costs and higher accuracy compared to most existing methods. The method is easy to apply to multi segment soft robots, both in 2D and 3D. As a case study, a fully 3D-printed soft robot manipulator is tested using a control unit and the model predictions show good agreement with the experimental results.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data. This paves the way for stronger privacy guarantees when building predictive models. The most used algorithms for FL are parameter-averaging based schemes (e.g., Federated Averaging) that, however, have well known limits: (i) Clients must implement the same model architecture; (ii) Transmitting model weights and model updates implies high communication cost, which scales up with the number of model parameters; (iii) In presence of non-IID data distributions, parameter-averaging aggregation schemes perform poorly due to client model drifts. Federated adaptations of regular Knowledge Distillation (KD) can solve and/or mitigate the weaknesses of parameter-averaging FL algorithms while possibly introducing other trade-offs. In this article, we provide a review of KD-based algorithms tailored for specific FL issues.
translated by 谷歌翻译
We present SLATE, a sequence labeling approach for extracting tasks from free-form content such as digitally handwritten (or "inked") notes on a virtual whiteboard. Our approach allows us to create a single, low-latency model to simultaneously perform sentence segmentation and classification of these sentences into task/non-task sentences. SLATE greatly outperforms a baseline two-model (sentence segmentation followed by classification model) approach, achieving a task F1 score of 84.4\%, a sentence segmentation (boundary similarity) score of 88.4% and three times lower latency compared to the baseline. Furthermore, we provide insights into tackling challenges of performing NLP on the inking domain. We release both our code and dataset for this novel task.
translated by 谷歌翻译